Homework 5
Important:
All answers should be rounded to 3 decimal places, except the last problem, which requires an exact answer.
In this question, we will train a Naive Bayes classifier to predict the class label Y as a function of the input features.
We are given the following 15 training points:
What is the maximum likelihood estimate of the prior P(Y)?
Y | P(Y) |
A | [q1.1] |
B | [q1.2] |
C | [q1.3] |
What are the maximum likelihood estimates of the conditional probability distributions? Fill in the tables below (the second and third are done for you).
Value | Y | P(Value | Y) |
0 | A | [q1.4] |
1 | A | [q1.5] |
0 | B | [q1.6] |
1 | B | [q1.7] |
0 | C | [q1.8] |
1 | C | [q1.9] |
Value | Y | P(Value | Y) |
0 | A | 1.000 |
1 | A | 0.000 |
0 | B | 0.222 |
1 | B | 0.778 |
0 | C | 0.250 |
1 | C | 0.750 |
Value | Y | P(Value | Y) |
0 | A | 0.500 |
1 | A | 0.500 |
0 | B | 0.000 |
1 | B | 1.000 |
0 | C | 0.500 |
1 | C | 0.500 |
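For reference, the maximum likelihood estimates above are just relative frequencies of the training counts. Below is a minimal Python sketch of that counting; it uses a small hypothetical data set rather than the 15 training points from this problem, and the feature names f1 and f2 are placeholders.

```python
from collections import Counter, defaultdict

def mle_naive_bayes(points):
    """points: list of (feature_dict, label) pairs.
    Returns the MLE prior P(Y) and the CPTs P(feature = value | Y)."""
    n = len(points)
    label_counts = Counter(label for _, label in points)
    prior = {y: c / n for y, c in label_counts.items()}

    # P(feature = value | Y = y) = count(feature = value, Y = y) / count(Y = y)
    pair_counts = defaultdict(int)
    for feats, label in points:
        for f, v in feats.items():
            pair_counts[(f, v, label)] += 1
    cpt = {key: c / label_counts[key[2]] for key, c in pair_counts.items()}
    return prior, cpt

# Hypothetical toy data (NOT the 15 training points from the homework):
toy = [({"f1": 0, "f2": 1}, "A"), ({"f1": 1, "f2": 1}, "B"), ({"f1": 0, "f2": 0}, "A")]
prior, cpt = mle_naive_bayes(toy)
print(round(prior["A"], 3), round(cpt[("f1", 0, "A")], 3))  # 0.667 1.0
```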
Following question 1, now consider a new data point. Use your classifier to determine the joint probability of each cause Y with this new data point, along with the posterior probability of Y given the new data:
Y | Joint probability |
A | [q2.1] |
B | [q2.2] |
C | [q2.3] |
Y | Posterior probability |
A | [q2.4] |
B | [q2.5] |
C | [q2.6] |
What label does your classifier give to the new data point? (Break ties alphabetically.) Enter a single capital letter.
[q2.7]
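The joint and posterior entries above follow the standard Naive Bayes decomposition: P(Y = y, x) = P(y) · Π_i P(x_i | y), and the posterior is that joint renormalized over the classes. Here is a sketch that reuses the dictionaries from the previous snippet; the query point is a hypothetical placeholder, not the new point from the question.

```python
def joint_and_posterior(prior, cpt, x):
    """x: dict of feature -> observed value.
    Returns P(Y = y, x) for each y and the normalized posterior P(Y = y | x)."""
    joint = {}
    for y, p_y in prior.items():
        p = p_y
        for f, v in x.items():
            p *= cpt.get((f, v, y), 0.0)  # an unseen (feature, value, class) combination has MLE probability 0
        joint[y] = p
    total = sum(joint.values())
    posterior = {y: (p / total if total > 0 else 0.0) for y, p in joint.items()}
    return joint, posterior

# Hypothetical query point with the placeholder feature names from above:
joint, posterior = joint_and_posterior(prior, cpt, {"f1": 0, "f2": 1})
label = max(sorted(posterior), key=posterior.get)  # argmax; sorting first breaks ties alphabetically
```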
The training data is repeated here for your convenience:
Following the previous questions, now use Laplace Smoothing with strength k = 3 to estimate the prior P(Y) for the same data.
Y | P(Y) |
A | [q3.1] |
B | [q3.2] |
C | [q3.3] |
Use Laplace Smoothing with strength k = 3 to estimate the conditional probability distributions below (again, the second and third are done for you).
Value | Y | P(Value | Y) |
0 | A | [q3.4] |
1 | A | [q3.5] |
0 | B | [q3.6] |
1 | B | [q3.7] |
0 | C | [q3.8] |
1 | C | [q3.9] |
Value | Y | P(Value | Y) |
0 | A | 0.625 |
1 | A | 0.375 |
0 | B | 0.333 |
1 | B | 0.667 |
0 | C | 0.400 |
1 | C | 0.600 |
Value | Y | P(Value | Y) |
0 | A | 0.500 |
1 | A | 0.500 |
0 | B | 0.200 |
1 | B | 0.800 |
0 | C | 0.500 |
1 | C | 0.500 |
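Laplace smoothing with strength k adds k imaginary occurrences of every outcome before normalizing, so the smoothed prior is P(Y = y) = (count(y) + k) / (N + k·|Y|), and each conditional entry is P(value | y) = (count(value, y) + k) / (count(y) + k·V), where V is the number of possible values of that feature (2 for a binary feature). A minimal sketch with placeholder counts, not the homework data:

```python
def laplace_prior(label_counts, k):
    """label_counts: dict class -> count.  Returns the add-k smoothed prior P(Y)."""
    n = sum(label_counts.values())
    return {y: (c + k) / (n + k * len(label_counts)) for y, c in label_counts.items()}

def laplace_conditional(value_counts, k, num_values=2):
    """value_counts: dict value -> count within one class (for one feature).
    Returns the add-k smoothed P(value | class) over num_values possible values."""
    n = sum(value_counts.values())
    return {v: (c + k) / (n + k * num_values) for v, c in value_counts.items()}

# Hypothetical counts: a class seen 5 times out of 15, with 3 classes and k = 3
print(round(laplace_prior({"A": 5, "B": 6, "C": 4}, k=3)["A"], 3))  # (5 + 3) / (15 + 9) = 0.333
```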
Now consider again the new data point. Use the Laplace-Smoothed version of your classifier to determine the joint probability of each cause Y with this new data point, along with the posterior probability of Y given the new data:
Y | Joint probability |
A | [q4.1] |
B | [q4.2] |
C | [q4.3] |
Y | Posterior probability |
A | [q4.4] |
B | [q4.5] |
C | [q4.6] |
What label does your (Laplace-Smoothed) classifier give to the new data point? (Break ties alphabetically.) Enter a single capital letter.
[q4.7]
Consider a context-free grammar with the following rules (assume that S is the start symbol):
S → NP VP
NP → DT NN
NP → NP PP
PP → IN NP
VP → VB NP
DT → the
NN → man
NN → dog
NN → cat
NN → park
VB → saw
IN → in
IN → with
IN → under
How many parse trees are there under this grammar for the sentence: the man saw the dog in the park?
Following the previous question, how many parse trees are there for the sentence: the man saw the dog in the park with the cat?
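One way to check a hand count of parse trees is a CKY-style chart that stores, for every span and nonterminal, the number of derivations; the grammar above is already in binary form, so it can be entered directly. The sketch below is illustration only, not part of the assignment.

```python
from collections import defaultdict

# Grammar from the question: binary rules and lexical (terminal) rules.
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "IN", "NP"), ("VP", "VB", "NP")]
LEXICAL = {"the": ["DT"], "man": ["NN"], "dog": ["NN"], "cat": ["NN"],
           "park": ["NN"], "saw": ["VB"], "in": ["IN"], "with": ["IN"],
           "under": ["IN"]}

def count_parses(words, start="S"):
    """CKY chart where chart[(i, j)][X] = number of parse trees of X over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for pos in LEXICAL.get(w, []):
            chart[(i, i + 1)][pos] += 1
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[(i, j)][parent] += (chart[(i, split)][left]
                                              * chart[(split, j)][right])
    return chart[(0, n)][start]

# Count for the first sentence in the question:
print(count_parses("the man saw the dog in the park".split()))
```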
Consider the following PCFG (probabilities for each rule are shown after the rule):
S → NP VP 1.0
PP → P NP 1.0
VP → V NP 0.6
VP → VP PP 0.4
P → with 0.8
P → in 0.2
V → saw 0.7
V → look 0.3
NP → NP PP 0.3
NP → Astronomers 0.12
NP → ears 0.18
NP → saw 0.02
NP → stars 0.18
NP → telescopes 0.2
What is the probability of the best parse tree for the sentence: Astronomers saw stars with ears?
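The probability of a parse tree under a PCFG is the product of the probabilities of all rules used in it, and the best-parse probability is the maximum of that product over all parses. Below is a Viterbi-CKY sketch using the grammar above (illustration only).

```python
from collections import defaultdict

# PCFG from the question: (parent, left, right, prob) binary rules and (parent, word, prob) lexical rules.
BINARY = [("S", "NP", "VP", 1.0), ("PP", "P", "NP", 1.0),
          ("VP", "V", "NP", 0.6), ("VP", "VP", "PP", 0.4),
          ("NP", "NP", "PP", 0.3)]
LEXICAL = [("P", "with", 0.8), ("P", "in", 0.2), ("V", "saw", 0.7),
           ("V", "look", 0.3), ("NP", "Astronomers", 0.12), ("NP", "ears", 0.18),
           ("NP", "saw", 0.02), ("NP", "stars", 0.18), ("NP", "telescopes", 0.2)]

def best_parse_prob(words, start="S"):
    """chart[(i, j)][X] = probability of the best parse of X over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for parent, word, p in LEXICAL:
            if word == w:
                chart[(i, i + 1)][parent] = max(chart[(i, i + 1)][parent], p)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for parent, left, right, p in BINARY:
                    cand = p * chart[(i, split)][left] * chart[(split, j)][right]
                    chart[(i, j)][parent] = max(chart[(i, j)][parent], cand)
    return chart[(0, n)][start]

print(best_parse_prob("Astronomers saw stars with ears".split()))
```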
Which of the following are true of convolutional neural networks (CNNs) for image analysis?
Lasso can be interpreted as least-squares linear regression where
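For background (the answer choices are not reproduced here), one common way to write the lasso objective is least-squares loss plus an L1 penalty on the weights:

\min_{w} \; \|Xw - y\|_2^2 + \lambda \|w\|_1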
Suppose we are given data comprising points of several different classes. Each class has a different probability distribution from which the sample points are drawn. We do not have the class labels. We use k-means clustering to try to guess the classes. Which of the following circumstances would undermine its effectiveness?
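For reference, the k-means procedure referred to here is Lloyd's algorithm: alternately assign points to the nearest centroid and recompute each centroid as the mean of its cluster. A minimal sketch with hypothetical 2-D data:

```python
import random

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist2(p, centroids[c]))].append(p)
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids

# Hypothetical 2-D data with two visible groups:
print(kmeans([(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)], k=2))
```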